bad local minima


Reviews: Genetic-Gated Networks for Deep Reinforcement Learning

Neural Information Processing Systems

The authors propose a new RL framework that combines gradient-free genetic algorithms with gradient-based optimization (policy gradients). The idea is to parameterize an ensemble of actors by using a binary gating mechanism, similar to dropout, between hidden layers. Instead of sampling a new gate pattern at every iteration, as in dropout, each gate is viewed as a gene and the activation pattern as a chromosome. This allows learning the policy with a combination of a genetic algorithm and policy gradients. The authors apply the proposed algorithm to the Atari domain, and the results demonstrate significant improvement over standard algorithms. They also apply their method to continuous control (OpenAI Gym MuJoCo benchmarks), yielding results that are comparable to standard PPO.
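The gate-as-gene idea described in this review can be sketched in a few lines. Everything below (the network shapes, the placeholder fitness function, the mutation rate) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, W2, gate):
    """One hidden layer whose units are masked by a FIXED binary gate
    ("chromosome"), instead of a freshly sampled dropout mask."""
    h = np.maximum(0.0, x @ W1)   # ReLU hidden layer
    h = h * gate                  # gate pattern is fixed per actor
    return h @ W2

# A tiny population of actors: shared weights + one gate vector per actor.
hidden = 8
W1 = rng.normal(size=(4, hidden))
W2 = rng.normal(size=(hidden, 2))
population = [rng.integers(0, 2, size=hidden) for _ in range(4)]

def fitness(gate):
    # Placeholder fitness; in the paper this would be the actor's return.
    x = rng.normal(size=(16, 4))
    return -np.abs(forward(x, W1, W2, gate)).mean()

def mutate(gate, p=0.1):
    # Flip each gene independently with probability p.
    flip = rng.random(gate.shape) < p
    return np.where(flip, 1 - gate, gate)

# One genetic step (sketch): keep the fittest gate, mutate it to refill
# the population; policy-gradient updates to W1/W2 would run in between.
best = max(population, key=fitness)
population = [best] + [mutate(best) for _ in range(3)]
```

The shared weights can still be trained with policy gradients between generations, which is the hybrid the review describes.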


Reviews: Adding One Neuron Can Eliminate All Bad Local Minima

Neural Information Processing Systems

The main contribution of this work is to prove that by adding a single exponential function (directly) from input to output and adding a mild l_2 regularizer, the slightly modified, highly nonconvex loss function does not have any non-global local minima. Moreover, all of these local minima actually correspond to the global minima of the original, unmodified nonconvex loss. This surprising result, to the best of my knowledge, is new and of genuine interest. This phenomenon is a bit curious and perhaps deserves more elaboration; this, I am afraid, is likely what is going on here (if you drop the separable assumption).


Reviews: Low-rank matrix reconstruction and clustering

Neural Information Processing Systems

Review of "Low-rank matrix reconstruction and clustering". This paper contributes a new algorithm for low-rank matrix reconstruction based on an application of Belief Propagation (BP) message-passing to a Bayesian model of the reconstruction problem. The algorithm, as described in the "Supplementary Material", incorporates two simplifying approximations, based on assuming a large number of rows and columns, respectively, in the input matrix. The algorithm is evaluated in a novel manner against Lloyd's K-means algorithm by formulating clustering as a matrix reconstruction problem. It is also compared against Variational Bayes Matrix Factorization (VBMF), which seems to be the only previous message-passing reconstruction algorithm. Cons: there are some arguments against accepting the paper.


Mildly Overparameterized ReLU Networks Have a Favorable Loss Landscape

Karhadkar, Kedar, Murray, Michael, Tseran, Hanna, Montúfar, Guido

arXiv.org Artificial Intelligence

We study the loss landscape of two-layer mildly overparameterized ReLU neural networks on a generic finite input dataset for the squared error loss. Our approach involves bounding the dimension of the sets of local and global minima using the rank of the Jacobian of the parameterization map. Using results on random binary matrices, we show most activation patterns correspond to parameter regions with no bad differentiable local minima. Furthermore, for one-dimensional input data, we show most activation regions realizable by the network contain a high dimensional set of global minima and no bad local minima. We experimentally confirm these results by finding a phase transition from most regions having full rank to many regions having deficient rank depending on the amount of overparameterization.
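The rank-of-the-Jacobian argument can be made concrete for one fixed activation pattern; this is a hedged sketch with made-up sizes, not the authors' code. Inside a region where the pattern is constant, each ReLU is locally linear, so the Jacobian of the outputs with respect to the parameters has a closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 5, 3, 6            # n data points, input dim d, m hidden ReLU units
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))  # hidden-layer weights
v = rng.normal(size=m)       # output weights

# Binary activation pattern of the hidden units on this dataset.
A = (X @ W.T > 0).astype(float)

# Jacobian of f(x_i) = sum_j v_j * relu(w_j . x_i) w.r.t. all parameters,
# valid inside the region where the pattern A stays constant:
#   d f(x_i) / d w_j = v_j * A[i, j] * x_i
#   d f(x_i) / d v_j = A[i, j] * (w_j . x_i)
J_W = np.concatenate([(A[:, [j]] * v[j]) * X for j in range(m)], axis=1)  # (n, m*d)
J_v = A * (X @ W.T)                                                       # (n, m)
J = np.concatenate([J_W, J_v], axis=1)

# Full row rank (= n) is the condition under which the squared-error loss
# has no bad differentiable local minima inside this activation region.
rank = np.linalg.matrix_rank(J)
```

With mild overparameterization (m*d + m well above n), a generic pattern makes `J` full row rank, which is the regime the abstract describes.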


Maximum-and-Concatenation Networks

Xie, Xingyu, Kong, Hao, Wu, Jianlong, Zhang, Wayne, Liu, Guangcan, Lin, Zhouchen

arXiv.org Machine Learning

While successful in many fields, deep neural networks (DNNs) still suffer from some open problems, such as bad local minima and unsatisfactory generalization performance. In this work, we propose a novel architecture called Maximum-and-Concatenation Networks (MCN) to try to eliminate bad local minima and improve generalization ability as well. Remarkably, we prove that MCN has a very nice property; that is, \emph{every local minimum of an $(l+1)$-layer MCN can be better than, or at least as good as, the global minima of the network consisting of its first $l$ layers}. In other words, by increasing the network depth, MCN can autonomously improve the goodness of its local minima; what is more, \emph{it is easy to plug MCN into an existing deep model to make it also have this property}. Finally, under mild conditions, we show that MCN can approximate certain continuous functions arbitrarily well with \emph{high efficiency}; that is, the covering number of MCN is much smaller than that of most existing DNNs such as deep ReLU networks. Based on this, we further provide a tight generalization bound to guarantee the inference ability of MCN when dealing with testing samples.
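The abstract does not spell out the exact MCN layer, so the following is purely illustrative of combining an elementwise maximum with concatenation; the layer form, names, and shapes are guesses, not the paper's definition:

```python
import numpy as np

rng = np.random.default_rng(2)

def mcn_layer(x, Wa, Wb):
    """Hypothetical layer form (see the paper for the actual definition):
    take the elementwise maximum of two linear maps of the input and
    concatenate it with the incoming features, so earlier-layer
    representations are carried forward unchanged."""
    return np.concatenate([np.maximum(x @ Wa, x @ Wb), x], axis=-1)

x = rng.normal(size=(4, 3))
W1a, W1b = rng.normal(size=(3, 5)), rng.normal(size=(3, 5))
h = mcn_layer(x, W1a, W1b)   # shape (4, 5 + 3): new features plus skip copy
```

Carrying the input forward via concatenation is one simple way a deeper network can always do at least as well as its first $l$ layers, which is the flavor of the guarantee quoted above.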


Understanding Global Loss Landscape of One-hidden-layer ReLU Networks, Part 2: Experiments and Analysis

Liu, Bo

arXiv.org Machine Learning

The existence of local minima for one-hidden-layer ReLU networks has been investigated theoretically in [8]. Based on that theory, in this paper we first analyze how large the probability that local minima exist is for 1D Gaussian data, and how it varies over the whole weight space. We show that this probability is very low in most regions. We then design and implement a linear-programming-based approach to judge the existence of genuine local minima, and use it to predict whether bad local minima exist for the MNIST and CIFAR-10 datasets; we find that there are no bad differentiable local minima almost everywhere in weight space once some hidden neurons are activated by samples. These theoretical predictions are verified experimentally by showing that gradient descent is not trapped in the cells from which it starts. We also perform experiments to explore the count and size of differentiable cells in the weight space.
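A linear-programming feasibility check of this flavor can be sketched as follows; this is a simplified stand-in (a single hidden unit and a hypothetical margin encoding), not the authors' method:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, d = 6, 3
X = rng.normal(size=(n, d))
pattern = rng.integers(0, 2, size=n)   # desired on/off pattern of one hidden unit

# Feasibility LP (sketch): does some weight vector w realize this pattern,
# i.e. does sign(x_i . w) match pattern_i? Active samples get a unit margin
# to rule out the trivial w = 0; the objective is zero (pure feasibility).
A_ub, b_ub = [], []
for i in range(n):
    if pattern[i]:
        A_ub.append(-X[i]); b_ub.append(-1.0)   # x_i . w >= 1
    else:
        A_ub.append(X[i]);  b_ub.append(0.0)    # x_i . w <= 0

res = linprog(c=np.zeros(d), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * d)
realizable = bool(res.success)   # True iff the activation cell is non-empty
```

In the paper's setting one would add the stationarity conditions of a candidate minimum as further linear constraints; infeasibility then certifies that no genuine local minimum lies in that cell.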


Adding One Neuron Can Eliminate All Bad Local Minima

LIANG, SHIYU, Sun, Ruoyu, Lee, Jason D., Srikant, R.

Neural Information Processing Systems

One of the main difficulties in analyzing neural networks is the non-convexity of the loss function, which may have many bad local minima. In this paper, we study the landscape of neural networks for binary classification tasks. Under mild assumptions, we prove that after adding one special neuron with a skip connection to the output, or one special neuron per layer, every local minimum is a global minimum.
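A hedged sketch of the construction: one exponential neuron wired directly from input to output, plus a mild l_2 penalty on its output weight. The placeholder base model, the logistic loss, and all constants below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(4)

def augmented_loss(theta, a, w, b, X, y, lam=1e-2):
    """Original output f(x; theta) augmented by a * exp(w . x + b),
    a single skip-connected exponential neuron, with an l2 penalty on a.
    Here f is a stand-in linear model; the paper allows general networks."""
    f = X @ theta                    # placeholder for the original network
    g = a * np.exp(X @ w + b)        # the single added neuron
    margins = y * (f + g)            # binary labels y in {-1, +1}
    return np.mean(np.log1p(np.exp(-margins))) + lam * a ** 2

X = rng.normal(size=(10, 3))
y = rng.choice([-1.0, 1.0], size=10)
loss = augmented_loss(rng.normal(size=3), 0.5, rng.normal(size=3), 0.0, X, y)
```

The claim summarized above is that every local minimum of this augmented objective is global, and that at such minima the added neuron's contribution vanishes, recovering a global minimum of the original loss.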


Understanding Global Loss Landscape of One-hidden-layer ReLU Neural Networks

Liu, Bo

arXiv.org Machine Learning

For one-hidden-layer ReLU networks, we show that all local minima are global within each differentiable region, and that these local minima can be unique or continuous, depending on the data, the activation pattern of the hidden neurons, and the network size. We give criteria to identify whether local minima lie inside their defining regions, and if so (we call them genuine differentiable local minima), their locations and loss values. Furthermore, we give necessary and sufficient conditions for the existence of saddle points as well as non-differentiable local minima. Finally, we compute the probability of getting stuck in genuine local minima for Gaussian input data and parallel weight vectors, and show that it is exponentially vanishing when the weights lie in regions where data are not too scarce. This may help explain why gradient-based local search methods usually do not get trapped in local minima when training deep ReLU neural networks.